7 research outputs found

    Tools for knowledge acquisition within the NeuroScholar system and their application to anatomical tract-tracing data

    Get PDF
    BACKGROUND: Knowledge bases that summarize the published literature provide useful online references for specific areas of systems-level biology that are not otherwise supported by large-scale databases. In the field of neuroanatomy, groups of small focused teams have constructed medium size knowledge bases to summarize the literature describing tract-tracing experiments in several species. Despite years of collation and curation, these databases only provide partial coverage of the available published literature. Given that the scientists reading these papers must all generate the interpretations that would normally be entered into such a system, we attempt here to provide general-purpose annotation tools to make it easy for members of the community to contribute to the task of data collation. RESULTS: In this paper, we describe an open-source, freely available knowledge management system called 'NeuroScholar' that allows straightforward structured markup of the PDF files according to a well-designed schema to capture the essential details of this class of experiment. Although, the example worked through in this paper is quite specific to neuroanatomical connectivity, the design is freely extensible and could conceivably be used to construct local knowledge bases for other experiment types. Knowledge representations of the experiment are also directly linked to the contributing textual fragments from the original research article. Through the use of this system, not only could members of the community contribute to the collation task, but input data can be gathered for automated approaches to permit knowledge acquisition through the use of Natural Language Processing (NLP). CONCLUSION: We present a functional, working tool to permit users to populate knowledge bases for neuroanatomical connectivity data from the literature through the use of structured questionnaires. This system is open-source, fully functional and available for download from [1]

    The NeuARt II system: a viewing tool for neuroanatomical data based on published neuroanatomical atlases

    Get PDF
    BACKGROUND: Anatomical studies of neural circuitry describing the basic wiring diagram of the brain produce intrinsically spatial, highly complex data of great value to the neuroscience community. Published neuroanatomical atlases provide a spatial framework for these studies. We have built an informatics framework based on these atlases for the representation of neuroanatomical knowledge. This framework not only captures current methods of anatomical data acquisition and analysis, it allows these studies to be collated, compared and synthesized within a single system. RESULTS: We have developed an atlas-viewing application ('NeuARt II') in the Java language with unique functional properties. These include the ability to use copyrighted atlases as templates within which users may view, save and retrieve data-maps and annotate them with volumetric delineations. NeuARt II also permits users to view multiple levels on multiple atlases at once. Each data-map in this system is simply a stack of vector images with one image per atlas level, so any set of accurate drawings made onto a supported atlas (in vector graphics format) could be uploaded into NeuARt II. Presently the database is populated with a corpus of high-quality neuroanatomical data from the laboratory of Dr Larry Swanson (consisting 64 highly-detailed maps of PHAL tract-tracing experiments, made up of 1039 separate drawings that were published in 27 primary research publications over 17 years). Herein we take selective examples from these data to demonstrate the features of NeuArt II. Our informatics tool permits users to browse, query and compare these maps. The NeuARt II tool operates within a bioinformatics knowledge management platform (called 'NeuroScholar') either as a standalone or a plug-in application. CONCLUSION: Anatomical localization is fundamental to neuroscientific work and atlases provide an easily-understood framework that is widely used by neuroanatomists and non-neuroanatomists alike. NeuARt II, the neuroinformatics tool presented here, provides an accurate and powerful way of representing neuroanatomical data in the context of commonly-used brain atlases for visualization, comparison and analysis. Furthermore, it provides a framework that supports the delivery and manipulation of mapped data either as a standalone system or as a component in a larger knowledge management system

    Knowledge engineering tools for reasoning with scientific observations and interpretations: a neural connectivity use case

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>We address the goal of curating observations from published experiments in a generalizable form; reasoning over these observations to generate interpretations and then querying this interpreted knowledge to supply the supporting evidence. We present web-application software as part of the 'BioScholar' project (R01-GM083871) that fully instantiates this process for a well-defined domain: using tract-tracing experiments to study the neural connectivity of the rat brain.</p> <p>Results</p> <p>The main contribution of this work is to provide the first instantiation of a knowledge representation for experimental observations called 'Knowledge Engineering from Experimental Design' (KEfED) based on experimental variables and their interdependencies. The software has three parts: (a) the KEfED model editor - a design editor for creating KEfED models by drawing a flow diagram of an experimental protocol; (b) the KEfED data interface - a spreadsheet-like tool that permits users to enter experimental data pertaining to a specific model; (c) a 'neural connection matrix' interface that presents neural connectivity as a table of ordinal connection strengths representing the interpretations of tract-tracing data. This tool also allows the user to view experimental evidence pertaining to a specific connection. BioScholar is built in Flex 3.5. It uses Persevere (a <it>noSQL </it>database) as a flexible data store and PowerLoom<sup>® </sup>(a mature First Order Logic reasoning system) to execute queries using spatial reasoning over the BAMS neuroanatomical ontology.</p> <p>Conclusions</p> <p>We first introduce the KEfED approach as a general approach and describe its possible role as a way of introducing structured reasoning into models of argumentation within new models of scientific publication. We then describe the design and implementation of our example application: the BioScholar software. This is presented as a possible biocuration interface and supplementary reasoning toolkit for a larger, more specialized bioinformatics system: the Brain Architecture Management System (BAMS).</p

    Layout-aware text extraction from full-text PDF of scientific articles

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications.</p> <p>Results</p> <p>Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) <b>Detecting contiguous text blocks</b> using spatial layout processing to locate and identify blocks of contiguous text, (2) <b>Classifying text blocks into rhetorical categories</b> using a rule-based method and (3) <b>Stitching classified text blocks together in the correct order</b> resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision<sup>1</sup> = 0.96% Recall = 0.89% and F1 = 0.91%. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, <sup>2</sup>commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for our system and identify further areas of improvement.</p> <p>Conclusions</p> <p>LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at <url>http://code.google.com/p/lapdftext/</url>.</p
    corecore